Decoupled State-Execute Architecture

Authors

Miquel Pericàs

Adrián Cristal

Rubén González

Alexander V. Veidenbaum

Mateo Valero

Abstract

The majority of register file designs follow one of two well-known approaches. Many modern high-performance processors (POWER4 [1], Pentium 4 [2]) use a merged register file that holds both architectural and rename registers. Other processors (e.g., Opteron [3]) use a Future File, with rename registers kept separately in the reservation stations. Both approaches have issues that may limit their application in future microprocessors. The merged register file scales poorly in terms of power and performance, while the Future File pays a large penalty when recovering from branch mispredictions. In addition, the Future File requires reservation stations, a less scalable mechanism. This paper proposes to combine the best aspects of the traditional Future File architecture with those of the merged physical register file. The key idea is that the new architecture separates the processor state, in particular the registers, from the execution units in the pipeline back-end; hence the name Decoupled State-Execute Architecture. The resulting register file can be accessed in the pipeline front-end and has several desirable properties that allow several optimizations to be applied efficiently, most notably register file banking and a novel writeback filtering mechanism. With aggressive banking, only a 1.0% IPC degradation was observed, and the new writeback filtering technique further lowered energy consumption. Together, the two optimizations remove approximately 80% of the energy consumed in the register file data array.
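Register file banking, one of the two optimizations named in the abstract, splits the register file into several smaller banks, each with fewer ports, which typically lowers area and energy per access. The price is that reads mapping to the same bank in the same cycle can conflict and must be serialized, which is where an IPC loss comes from. The sketch below is a generic illustration of that trade-off, not the mechanism described in the paper: the modulo interleaving of registers across banks, the uniform per-bank port count, port sharing for duplicate reads, and the helper names `bank_of` and `read_cycles_needed` are all hypothetical assumptions.

```python
"""Generic sketch of read-port conflicts in a banked register file.

Hypothetical assumptions (not taken from the paper):
- registers are interleaved across banks by register number modulo bank count;
- every bank has the same number of read ports;
- duplicate reads of the same register in one cycle share a single port.
"""

from collections import Counter


def bank_of(reg: int, num_banks: int) -> int:
    """Map a physical register number to its bank (low-order interleaving)."""
    return reg % num_banks


def read_cycles_needed(regs: list[int], num_banks: int, ports_per_bank: int) -> int:
    """Return how many cycles are needed to serve all register reads in `regs`.

    Reads to different banks proceed in parallel; reads to one bank beyond
    its port count are serialized into additional cycles.
    """
    unique_regs = set(regs)  # assumed: identical reads share one port
    if not unique_regs:
        return 0
    reads_per_bank = Counter(bank_of(r, num_banks) for r in unique_regs)
    # The busiest bank sets the total latency (ceiling division by port count).
    return max(
        (count + ports_per_bank - 1) // ports_per_bank
        for count in reads_per_bank.values()
    )


if __name__ == "__main__":
    # 4 banks, 1 read port each: registers 1, 2, 3, 4 map to banks 1, 2, 3, 0.
    print(read_cycles_needed([1, 2, 3, 4], num_banks=4, ports_per_bank=1))  # 1
    # Registers 1 and 5 both map to bank 1, so their reads conflict.
    print(read_cycles_needed([1, 5, 2, 3], num_banks=4, ports_per_bank=1))  # 2
```

Adding ports per bank removes the conflict in the second case at the cost of larger, more power-hungry banks; keeping ports low and tolerating occasional conflicts is the kind of trade-off that the reported 1.0% IPC degradation under aggressive banking reflects.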


Similar resources

A Decoupled Fetch-Execute Engine with Static Branch Prediction Support

We describe a method for supporting static branch prediction on a decoupled fetch-execute pipeline. Using instruction buffers to decouple instruction fetch from the execute pipeline is an effective way to minimize instruction cache penalties by allowing instruction fetch and stall miss handling to proceed independent of the execution pipeline. Dynamic branch prediction is typically used with su...


The Latency Hiding Effectiveness of Decoupled Access/Execute Processors

Several studies have demonstrated that out-of-order execution processors may not be the most adequate organization for wide issue processors due to the increasing penalties that wire delays will cause in the issue logic. The main target of out-of-order execution is to hide functional unit latencies and memory latency. However, the former can be quite effectively handled at compile time and this...


A Decoupled Federate Architecture for Distributed Simulation Cloning

Distributed simulation cloning technology is designed to perform “what-if” analysis of existing High Level Architecture (HLA) based distributed simulations. The technology aims to enable the examination of alternative scenarios concurrently within the same simulation execution session. State saving and recovery are necessary for cloning a federate at runtime. However it is very difficult to hav...


Microarchitectural Miss/Execute Decoupling

The decoupled access/execute architecture described a machine that enables the access of memory values to be decoupled from the consumption of those values. Although never widely adopted in its original form, the decoupled design is a compelling way to tolerate memory latency. In this paper, we propose and demonstrate a novel implementation of decoupling, one based on the following two refineme...


Code Partitioning in Decoupled Compilers

Decoupled access/execute architectures seek to maximize performance by dividing a given program into two separate instruction streams and executing the streams on independent cooperating processors. The instruction streams consist of those instructions involved in generating memory accesses (the Access stream) and those that consume the data (the Execute stream). If the processor running the ac...


A Decoupled Translate Execute (DTE) Architecture to Improve Performance of Java Execution

Java is increasing in popularity in the software industry, and is being used to implement server and enterprise scale projects. Such applications require high performance, and need to run efficiently. Techniques like JIT compilers and PicoJava chips offer some speedup, but in this paper we look at alternate techniques in hardware which could be incorporated in future microprocessors. We propose ...


Publication date: 2005
